ABSTRACT

With the growing use of location-based services, location privacy 
attracts increasing attention from users, industry, and the research 
community. While considerable effort has been devoted to invent- 
ing techniques that prevent service providers from knowing a user's 
exact location, relatively little attention has been paid to enabling 
so-called peer-wise privacy, the protection of a user's location from
unauthorized peer users. This paper identifies an important effi- 
ciency problem in existing peer-privacy approaches that simply ap- 
ply a filtering step to identify users that are located in a query range, 
but that do not want to disclose their location to the querying peer. 
To solve this problem, we propose a novel, privacy-policy enabled 
index called the PEB-tree that seamlessly integrates location prox- 
imity and policy compatibility. We propose efficient algorithms 
that use the PEB-tree for processing privacy-aware range and kNN
queries. Extensive experiments suggest that the PEB-tree enables 
efficient query processing. 

1. INTRODUCTION 

We are experiencing an increasing availability of location-based 
services such as AT&T's TeleNav GPS Navigator, Sprint's Fam- 
ily Locator, and Intel's Thing Finder. A key obstacle to the broad 
adoption of location-based services is the lack of location privacy 
protection [2,20,30]. 

Specifically, in a setting where a service provider serves mul- 
tiple users, a user may have privacy concerns with respect to the 
service provider as well as the other service users. As an exam- 
ple of the first case, a user may worry that the service provider 
will disclose the user's locations (e.g., the user's daily route) to 
malicious parties. We use provider-wise privacy for privacy in re- 
lation to the service provider. As an example of the second case, an 
employee may not want work colleagues to know his/her location 
during lunch if he/she is outside the company building. This type
of access restriction may also prevent stalking or other personal 
security threats [24, 34]. We use peer-wise privacy for privacy in 
relation to peer users. 

Most research on location privacy thus far has been devoted to 
provider-wise privacy, and techniques such as spatial cloaking [10,
36], location distortion [18], and encryption [9] have been explored.
In relation to peer-wise privacy, only a simple filtering approach has
been employed. 

The setting of the filtering approach is one where users specify
their privacy preferences using location privacy policies that capture
who is allowed to see whose location and under what conditions. To
answer a peer-wise privacy-aware query, the filtering approach first
finds users who satisfy the query's location requirements in the same
way as is done for privacy-unaware location-based queries, i.e., using
existing moving object indexing and querying techniques. Only then
does it filter out users by inspecting their location privacy
policies.

For example, if a user issues a query for other nearby service 
users, the service provider not only needs to find nearby users; it 
also needs to check the privacy policies of the users found to en- 
sure that they are willing to disclose their locations to the querying 
user. When potential query results are found solely according to 
spatial proximity, which is well supported by existing indexing and 
query processing techniques, very large and unnecessary interme- 
diate results may occur because the policy checking may eliminate 
most of the results. Section 3 further elaborates on the problem. 

This paper aims to provide an indexing technique and accompanying
query processing algorithms that enable the efficient processing of
peer-wise privacy-aware queries that serve as the foundation for
typical location-based services. Our proposed approach is orthogonal
to existing approaches to supporting provider-wise privacy and can be
integrated with these to achieve additional privacy.

In particular, we propose the so-called Policy-Embedded B^x-tree
(PEB-tree), which organizes objects based on both spatial proximity
and privacy policy compatibility. The main idea is to generate an
indexing key value for each object that encodes location as well as
policy information. This way, objects spatially near each other and
with compatible privacy policies are assigned similar keys and are
placed near each other in the index. The PEB-tree is based on the
widely implemented B+-tree, which promises easy integration into
existing commercial database systems. Based on the PEB-tree, we
provide algorithms for processing privacy-aware range and k nearest
neighbor (kNN) queries.

The results of extensive empirical studies with the proposals sug- 
gest that the PEB-tree based algorithms outperform existing tech- 
niques considerably in terms of I/O cost. 

The rest of the paper is organized as follows. Section 2 reviews 
related work. Section 3 gives problem definitions, and Section 4 
describes the existing approach used for comparison. Section 5 
presents the proposed policy-embedded indexing techniques along 
with a cost analysis. Then Sections 6 and 7 cover cost modeling and 
empirical performance studies, respectively. Section 8 concludes 
and outlines future research directions. 

2. RELATED WORK 

As the PEB-tree integrates moving-object location and privacy, 
we first discuss research in moving-object database management 
and then location privacy. After that, we review works that share 
concepts that underlie our work. 

2.1 Indexing and Querying Moving Objects 

Previous Indexing Approaches 

Moving object indexing must contend with frequent updates. Thus, 
focus is often on the efficient support for workloads that contain 
queries as well as very frequent updates, which contrasts with earlier works
on spatial indexing where the data was assumed to be relatively 
static and focus was on query performance. 

Most recent indexing proposals fall into one of three main
categories: (i) R-tree-based indexes, such as the RUM-tree [35], the
TPR-tree [27], and the TPR*-tree [31]; (ii) B+-tree-based indexes,
such as the B^x-tree [13] and the B^dual-tree [32]; and (iii)
quadtree-based indexes, such as STRIPES [25]. A benchmark study [3]
finds that the TPR-tree, the B^x-tree, and STRIPES perform best under
different workloads. However, these indexes focus on spatial proximity
and offer no provisions for supporting privacy.

Two recent indexing proposals [4, 17] take into account both
location proximity and text similarity for finding the top-k most
relevant spatial web objects. In particular, these leverage the
inverted file for text similarity retrieval and the R-tree for spatial
proximity querying. The PEB-tree also considers two aspects of the
data it indexes, but it tackles a very different problem, namely
privacy-concerned location-based queries.

Following other research in moving object databases [13, 27, 31,
32], we represent the position of a moving object as a linear function
from time to point locations in two-dimensional Euclidean space:
$\vec{x}(t) = \vec{x} + \vec{v} \cdot (t - t_u)$, where $\vec{x}$ and
$\vec{v}$ are the two-dimensional position and velocity of the object
at time $t_u$, and $t_u$ is the time of the most recent update. An
object is thus given by the triple $(\vec{x}, \vec{v}, t_u)$.

An object issues a location update to the server when the deviation
between its actual location and the predicted location based on its
moving function exceeds a given threshold. Objects are required to
issue an update at least once within a maximum update time
$\Delta t_{mu}$ in order to keep the server informed about their
existence.
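The motion model and the update rule can be sketched as follows. This is only an illustration under stated assumptions: the class name `MovingObject`, the function `needs_update`, and the use of Euclidean distance for the deviation are hypothetical choices that are not prescribed by the text.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MovingObject:
    x: Tuple[float, float]  # position reported at time t_u
    v: Tuple[float, float]  # velocity reported at time t_u
    t_u: float              # time of the most recent update

    def position_at(self, t: float) -> Tuple[float, float]:
        """Predicted position x + v * (t - t_u)."""
        dt = t - self.t_u
        return (self.x[0] + self.v[0] * dt, self.x[1] + self.v[1] * dt)


def needs_update(obj, actual, t, threshold, dt_mu):
    """True if the deviation exceeds the threshold (assumed Euclidean)
    or if the maximum update time dt_mu has elapsed."""
    px, py = obj.position_at(t)
    deviation = math.hypot(actual[0] - px, actual[1] - py)
    return deviation > threshold or (t - obj.t_u) >= dt_mu
```

An object that follows its reported velocity exactly never triggers the deviation test, so the second condition is what bounds the age of any indexed entry.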

We proceed to describe the B^x-tree that serves as the base
structure of the PEB-tree.

The B^x-Tree

The B^x-tree is an efficient and practical moving object index [3]
that exploits the B+-tree, which renders it amenable to implementation
in real database systems. To exploit the B+-tree, the B^x-tree
transforms the linear functions that capture object movements into
single-dimensional values by means of a space-filling curve (e.g.,
the Z-curve) that is proximity preserving in the sense that points
close to one another in 2-dimensional space tend to be close to one
another in the transformed 1-dimensional space.

Figure 1: Updates in the B^x-Tree

To differentiate locations inserted at different times, the B^x-tree
partitions the time axis into intervals of duration $\Delta t_{mu}/n$,
where $\Delta t_{mu}$ is the maximum update interval and $n$ is a
chosen number of sub-partitions within $\Delta t_{mu}$. Each partition
has a label timestamp as shown in Figure 1. An update that occurs
during some time interval is performed as of the nearest future label
timestamp $t_{lab}$. This way, objects are assigned to different
partitions of the time axis.

An object $O = (\vec{x}, \vec{v}, t_u)$ is then indexed as of
$t_{lab} = \lceil t_u + \Delta t_{mu}/n \rceil_l$, where
$\lceil x \rceil_l$ denotes the nearest later label timestamp of $x$.
The value that is indexed, the $B^x$ value, is the concatenation
($\oplus$) of the binary values ($[\cdot]_2$) of two components: the
$index\_partition$, computed from the label timestamp (Equation 2);
and $x\_rep$, computed from the object location as of the label
timestamp (Equation 3).

$$B^x value(O, t_u) = [index\_partition]_2 \oplus [x\_rep]_2 \quad (1)$$
$$index\_partition = (t_{lab} / (\Delta t_{mu}/n) - 1) \bmod (n + 1) \quad (2)$$
$$x\_rep = x\_value(\vec{x} + \vec{v} \cdot (t_{lab} - t_u)) \quad (3)$$

For example, let the time axis be partitioned into intervals of
duration $\Delta t_{mu}/2$. Objects updated between time 0 and
$\Delta t_{mu}/2$ are indexed as of the time $t_{lab} = \Delta t_{mu}$.
The resulting $index\_partition$ is 1, or '01' in binary format. Next,
$x\_rep$ is the location as of $t_{lab}$ converted to a
single-dimensional value using a space-filling curve. The B^x-tree
inherits the B+-tree's efficiency of insertions and deletions.
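The key computation in Equations 1 to 3 can be sketched as follows. The sketch rests on several assumptions that the text does not fix: label timestamps are taken to be the multiples of $\Delta t_{mu}/n$, $\Delta t_{mu}$ is an integer divisible by $n$, the space is a grid of unit cells with coordinates in [0, 2^bits), and the Z-curve interleaves bits with x in the even positions. All function names are hypothetical.

```python
import math


def label_timestamp(t_u, dt_mu, n):
    """Nearest label timestamp at or after t_u + dt_mu/n.
    Assumes labels are the multiples of dt_mu/n and dt_mu % n == 0."""
    step = dt_mu // n
    return math.ceil((t_u + step) / step) * step


def index_partition(t_lab, dt_mu, n):
    """Equation 2: (t_lab / (dt_mu/n) - 1) mod (n + 1)."""
    step = dt_mu // n
    return (t_lab // step - 1) % (n + 1)


def z_value(cx, cy, bits):
    """Z-curve value of grid cell (cx, cy) by bit interleaving."""
    z = 0
    for i in range(bits):
        z |= ((cx >> i) & 1) << (2 * i)      # x bits in even positions
        z |= ((cy >> i) & 1) << (2 * i + 1)  # y bits in odd positions
    return z


def bx_value(x, v, t_u, dt_mu, n, bits):
    """Equation 1: index partition concatenated with the Z-curve value
    of the position predicted for the label timestamp (Equation 3)."""
    t_lab = label_timestamp(t_u, dt_mu, n)
    dt = t_lab - t_u
    cx = int(x[0] + v[0] * dt)  # assumed unit grid cells
    cy = int(x[1] + v[1] * dt)
    return (index_partition(t_lab, dt_mu, n) << (2 * bits)) | z_value(cx, cy, bits)
```

With `dt_mu = 120` and `n = 2`, an update at time 30 is indexed as of time 120 and falls into index partition 1, which agrees with the example above.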

To process range queries using the B^x-tree, the query ranges
need to be transformed to account for data transformation.
Specifically, query ranges need to be enlarged to ensure that all
objects that may be in the results are found.

Figure 2 shows an example where a solid rectangle R is the query
range at time 6 and black points are the locations of objects A,
B, and C as of time 5. Objects A and B will be in R at time
6 according to their velocity vectors. To ensure that all objects are
found, R is expanded to R' using the maximum object speeds along
the two axes. For example, since the maximum downward speed is
2, the distance between the upper border of R and R' is obtained by
multiplying this speed by the time difference, i.e., 2 × (6 - 5) = 2.
The enlargement guarantees that objects that may be in the result
are found.

Figure 2: Range Query in the B^x-Tree
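The enlargement step can be sketched as follows, assuming that objects are indexed as of a time no later than the query time and that per-direction maximum speeds are known. The function name `enlarge`, the rectangle layout, and the dictionary of speeds are hypothetical.

```python
def enlarge(rect, t_query, t_index, max_speed):
    """Expand a query rectangle so that it covers every object that is
    outside it at t_index but may be inside it at t_query.

    rect      = (x_lo, x_hi, y_lo, y_hi)
    max_speed = maximum speeds for 'left', 'right', 'up', 'down'
    Assumes t_index <= t_query (objects indexed in the past).
    """
    if t_query < t_index:
        raise ValueError("this sketch assumes t_index <= t_query")
    dt = t_query - t_index
    x_lo, x_hi, y_lo, y_hi = rect
    return (
        x_lo - max_speed["right"] * dt,  # objects left of R moving right
        x_hi + max_speed["left"] * dt,   # objects right of R moving left
        y_lo - max_speed["up"] * dt,     # objects below R moving up
        y_hi + max_speed["down"] * dt,   # objects above R moving down
    )
```

With a maximum downward speed of 2, a query time of 6, and an index time of 5, the upper border moves up by 2, as in the example of Figure 2.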

The enlarged query range is then converted into intervals of
consecutive space-filling curve values. As a result, a sequence of
range queries is issued to the B^x-tree. The objects found are then
checked for inclusion in a refinement step by using their actual
locations at the query time.

The B^x-tree can also process predictive k nearest neighbor (kNN)
queries. To do that, a range query based on an estimated kNN distance
is issued first; then the range is enlarged gradually until k
nearest neighbors are found.
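The idea of answering a kNN query through repeatedly enlarged range queries can be sketched as follows. This is an in-memory illustration only: the list scan stands in for an index range query, and the initial radius `r0` and the doubling factor are assumptions rather than the estimation method of the B^x-tree.

```python
import math


def knn_by_range_expansion(q, points, k, r0, growth=2.0):
    """Return the k points nearest to q by enlarging a square range.

    A square with half-side r contains every point within Euclidean
    distance r of q, so once k points lie within distance r, the k
    nearest neighbors are guaranteed to be among them.
    """
    assert r0 > 0 and growth > 1
    r = r0
    while True:
        # Stand-in for a range query against the index.
        candidates = [p for p in points
                      if abs(p[0] - q[0]) <= r and abs(p[1] - q[1]) <= r]
        within = [p for p in candidates if math.dist(p, q) <= r]
        if len(within) >= k or len(candidates) == len(points):
            pool = within if len(within) >= k else candidates
            return sorted(pool, key=lambda p: math.dist(p, q))[:k]
        r *= growth  # enlarge the range and retry
```

The stopping test uses the circle inscribed in the square, which is what makes the result exact rather than approximate.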

The PEB-tree augments the B^x-tree with privacy policy information
and hence has novel policy encoding and index key generation
algorithms. Moreover, the PEB-tree's query algorithm is more
complex than that of the B^x-tree because both privacy and location
proximity need to be considered simultaneously.

2.2 Location Anonymization 

In provider-wise privacy, the service provider is typically
prevented from knowing a user's exact location by using one or more
of the following techniques: k-anonymization [29], spatial-temporal
cloaking, and encryption. Gruteser et al. [10] are the first to apply
k-anonymization to preserving location privacy and propose a
spatial-temporal cloaking approach: For each user, a trusted third
party (agent) generates a cloaking region in which at least k - 1
other users are also present. The service provider receives regions
instead of exact locations of users, and hence the service provider
cannot distinguish a user from other users in the same region.
Various extensions [1, 8, 15, 19, 21] aim to improve service
flexibility and quality. A key limitation in these techniques is the
performance bottleneck caused by the single anonymization agent.
Further, the single agent can become a new target of attacks by
malicious parties.

Next, several encryption-based location anonymization approaches
[9, 14, 26] have been proposed. The most representative one
is by Ghinita et al. [9], who employ Private Information Retrieval
(PIR) to prevent service providers from knowing a user's location
while providing a high quality of service.

Another thread of efforts [6, 12, 16] aims to perform location
anonymization at the user side. However, this approach requires
the users' devices to perform substantial computations and requires
extensive user involvement.

Despite extensive efforts on preserving provider-wise privacy,
little work has appeared on peer-wise privacy protection. In
Section 4, we cover two naive approaches to the indexing of moving
objects with peer-wise privacy protection.



2.3 Additional Related Techniques 

Works on spatial-keyword querying (e.g., [4, 17]) may seem similar
to our work since they also build an index for two aspects: location
proximity and text similarity. However, text similarity and privacy
policy compatibility are very different. In addition, we consider
moving objects, while spatial-keyword querying indexes consider
stationary objects.

We use a simple format for location privacy policy specification, 
which, however, contains the common major components of exist- 
ing location privacy policy specifications [11,23,28]. Last, it is 
worth noting that location privacy policies are different from the 
concept of location-based access control, such as GEO-RBAC [5], 
in the sense that location data plays different roles. In location- 
based access control, location data serves as a condition that needs 
to be verified before a user is granted a permission to particular re- 
sources (e.g., classified documents), while location data is the data 
to be protected by location privacy policies. 

3. PROBLEM DEFINITION 

As mentioned, we represent the position of a moving user as 
a linear function from time to point locations in two-dimensional 
Euclidean space. The model enables the answering of queries on 
near future positions if needed, and the parameters needed for the 
use of this model are readily available from GPS receivers. 

Next, we assume users will predefine their location privacy poli- 
cies and that the server has access to all users' privacy policies. We 
define a succinct yet expressive format for Location-Privacy Poli- 
cies (LPP for short) as follows. 

Definition 1. Let $u_1$ and $u_2$ be two users. Let $P_{1 \to 2}$
denote a location privacy policy assigned by $u_1$ for $u_2$.
$P_{1 \to 2}$ consists of three components
$\langle role, locr, tint \rangle$ given as follows.

- role: the relationship between $u_1$ and $u_2$, such as
"family member," "friend," or "colleague."

- locr: a spatial region.

- tint: a subset of the time domain.

A policy $P_{1 \to 2} = \langle role, locr, tint \rangle$ states that
if $u_2$ is related to $u_1$ by relationship role, then $u_2$ is
allowed to see $u_1$'s location when $u_1$ is located in locr during
tint.

For example, Bob lets his colleagues see his location when he is
in town (e.g., Chicago) during work hours (e.g., 8 a.m. to 5 p.m.).
The corresponding LPP is: $P = \langle colleague, Chicago,
[8 a.m., 5 p.m.] \rangle$. This way, access to Bob's location by users
who are identified as colleagues by Bob is regulated by P. The use of
the concept of role is inspired by Role-based Access Control [7],
which avoids writing the same policy for multiple people with the same
relationship to Bob.

The specific design of the privacy policy format is orthogonal to 
the paper's contribution, which supports a range of spatio-temporal 
policy formats. 
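As an illustration of Definition 1, a policy and its evaluation might be represented as in the following sketch. The rectangle standing in for "Chicago", the hour-based time interval, and the names `LPP` and `allows` are assumptions made for the example only.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class LPP:
    role: str                                # e.g., "colleague"
    locr: Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)
    tint: Tuple[float, float]                # (t_start, t_end), in hours


def allows(policy: LPP, requester_roles: Set[str],
           owner_loc: Tuple[float, float], t: float) -> bool:
    """True if the requester holds the policy's role and the policy
    owner is inside locr at a time within tint."""
    x, y = owner_loc
    x_lo, x_hi, y_lo, y_hi = policy.locr
    t_start, t_end = policy.tint
    return (policy.role in requester_roles
            and x_lo <= x <= x_hi and y_lo <= y <= y_hi
            and t_start <= t <= t_end)


# Bob's policy: colleagues may see him in town between 8 a.m. and 5 p.m.
# The rectangle (0, 10, 0, 10) is a hypothetical stand-in for Chicago.
bob_policy = LPP("colleague", (0, 10, 0, 10), (8, 17))
```

A requester who is only a "friend" of Bob is rejected by this policy regardless of where Bob is, which is the role check that the query definitions express as qID ∈ role.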

We support privacy-aware counterparts of the two arguably most 
fundamental query types, namely range queries and k nearest neigh- 
bor queries. The formal definitions are given next. 

Definition 2. (PRQ) The privacy-aware range query is defined as
$PRQ = \langle qID, R, t_q \rangle$, where qID is the query issuer's
identity, $R = ([x_1^l, x_1^u], [x_2^l, x_2^u])$ ('l' denotes 'lower
bound' and 'u' denotes 'upper bound'), and $t_q$ is the query time.
The query retrieves all users who satisfy the following two
conditions: (1) the user's location $(x, y)$ at time $t_q$ falls
within the query rectangle R; (2) the user has a location privacy
policy $\langle role, locr, tint \rangle$ in which $qID \in role$,
$(x, y) \in locr$, and $t_q \in tint$.

In Definition 2, the condition $qID \in role$ checks if the
relationship between the query issuer and the user is defined in the
user's location privacy policy.

Definition 3. (PkNN) The privacy-aware k nearest neighbor query is
defined as $PkNN = \langle qID, qLoc, k, t_q \rangle$, where qID is
the query issuer's identity, qLoc and k are the nearest neighbor query
parameters, and $t_q$ is the query time. The query retrieves k users
in U for which no other users are nearer to the query issuer's
location qLoc at query time $t_q$, where U is the set of all (m > k)
users who have a location privacy policy
$\langle role, locr, tint \rangle$, where $qID \in role$, the user's
location at time $t_q$ belongs to locr, and $t_q \in tint$.

To illustrate the problem that we tackle, we use the running
example shown in Figure 3. The black point denotes a user with ID
$u_1$ who wants to find her nearest friend. The star symbols represent
$u_1$'s friends, whose IDs are $u_{12}$, $u_{30}$, $u_{100}$, and
$u_{130}$. White circles represent other users. User $u_1$'s friends
may have different location privacy policies. Suppose that at the time
$u_1$ issues a privacy-aware nearest neighbor query, only one friend,
i.e., $u_{12}$ (highlighted by the solid star symbol), is willing to
disclose their location to $u_1$. The query result is then
$\{u_{12}\}$.

Figure 3: Running Example

4. SPATIAL INDEXING APPROACH 

An existing approach [19] applies filtering to the result obtained
from using a spatial index. In particular, the service provider
processes the privacy-aware queries as if they were normal spatial
queries and then evaluates the privacy policies on the returned
results. With this approach, many non-qualifying preliminary results
may be retrieved from the spatial index.

A possible spatial index for the example scenario is given in
Figure 4. Here, users are arranged purely based on their spatial
proximity. For instance, $u_1$ and $u_{100}$ are stored together as
they are close to one another.

To answer the privacy-aware nearest neighbor query from before,
the service provider first locates $u_1$'s nearest neighbor $u_{100}$
and then evaluates $u_{100}$'s privacy policy with respect to $u_1$.
Since $u_{100}$ does not allow $u_1$ to see his location at the query
time, the service provider has to look for other nearby users. The
query then needs to examine the next nearest neighbor, and this must
be repeated until the final answer $u_{12}$ is found. In the example,
at least four index nodes are accessed.

Figure 4: Spatial Index Example
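The behavior of the filtering approach on a nearest neighbor query can be sketched as follows. The user coordinates are invented for illustration and are not taken from Figure 4, the in-memory sort stands in for incremental nearest neighbor retrieval from a spatial index, and the counter tracks examined candidates rather than index node accesses.

```python
import math


def filtering_nn(q_loc, users, is_visible):
    """Nearest neighbor with policy filtering applied afterwards.

    users      : dict mapping user id -> (x, y)
    is_visible : callable(user id) -> bool, the policy check
    Returns (nearest visible user or None, candidates examined).
    """
    examined = 0
    # Stand-in for incremental nearest neighbor retrieval from an index.
    for uid in sorted(users, key=lambda u: math.dist(q_loc, users[u])):
        examined += 1
        if is_visible(uid):
            return uid, examined
    return None, examined


# Hypothetical positions: u100 is nearest, u12 is farthest.
positions = {"u100": (1, 0), "u30": (2, 0), "u130": (3, 0), "u12": (4, 0)}
answer, examined = filtering_nn((0, 0), positions, lambda u: u == "u12")
```

In this made-up layout every nearer user is retrieved and rejected before the single qualifying user is reached, which is the wasted work that a policy-aware index aims to avoid.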



5. POLICY-EMBEDDED B^x-TREE

To efficiently support privacy-aware queries, we propose a
three-step approach. First, we develop a generic policy encoding
technique that captures the compatibilities among location privacy
policies belonging to different users. The encoded values are called
sequence values. Second, we construct the Policy-Embedded B^x-tree
(PEB-tree) that indexes mobile users according to both spatial and
privacy policy proximity by carefully integrating sequence values
with location mapping values. Third, we propose efficient algorithms
for the queries defined in Section 3.

5.1 Location Privacy Policy Encoding 

The actual policy encoding is preceded by policy translation and 
policy comparison phases. 

In policy translation, the semantic locations defined in an LPP
are mapped to Euclidean regions. In policy comparison, we use a
score $\alpha \in [0, 1]$ to quantify the relationships between two
users $u_1$ and $u_2$. If no location privacy policy is defined
between $u_1$ and $u_2$, $\alpha = 0$; otherwise, $\alpha$ is
determined by the size of the region and the duration of the time
interval during which the two users allow each other to access their
location information. If two policies are incompatible, $\alpha = 0$.
As before, let $P_{1 \to 2}$ denote $u_1$'s policy regarding $u_2$.
We consider two cases.

• $P_{1 \to 2} \sim P_{2 \to 1}$: $u_1$ and $u_2$ are willing to
simultaneously disclose their locations to each other under certain
conditions. Thus, overlaps exist between the locr and tint in the two
policies. Let $O(locr_1, locr_2)$ denote the area of the overlap
between the two regions and let $D(tint_1, tint_2)$ denote the
duration of the overlap between the time intervals in the two
policies. We define $\alpha$ for this case as follows, where the area
S of the space domain and the duration T of the time domain are used
for normalization.

$$\alpha = \frac{O(locr_1, locr_2)}{S} \cdot \frac{D(tint_1, tint_2)}{T}$$

• $P_{1 \to 2} \nsim P_{2 \to 1}$: $u_1$ and $u_2$ will not
simultaneously disclose their locations to one another. In this case,
at least one of locr and tint in the two policies do not intersect.
The corresponding $\alpha$, which never exceeds 0.5, is defined as
follows.

$$\alpha = \frac{1}{2} \left( \frac{|locr_1|}{S} \cdot \frac{|tint_1|}{T} + \frac{|locr_2|}{S} \cdot \frac{|tint_2|}{T} \right)$$

The above function is also applied to the situation where only one
user has a policy regarding the other. For example, if $P_{2 \to 1}$
does not exist, the second term in the definition is omitted.
It is worth noting that the above equations can be extended to
cover the case where multiple policies exist between two users.
Also, other policy comparison approaches may be adopted to compute
$\alpha$ values.

Based on the obtained $\alpha$, we define the degree of
compatibility between two users' policies, denoted as $C(u_1, u_2)$.

$$C(u_1, u_2) = \begin{cases} \frac{1}{2}(1 + \alpha) & P_{1 \to 2} \sim P_{2 \to 1} \\ \alpha & P_{1 \to 2} \nsim P_{2 \to 1} \\ 0 & \alpha = 0 \end{cases} \quad (4)$$

The compatibility function $C(\cdot, \cdot)$ returns a value in
[0, 1]. The value is always greater than 0.5 for the first case, and
it never exceeds 0.5 for the second case. The goal is to give higher
priority to users who can sometimes see each other simultaneously than
to users who always disclose their locations to one another under
disjoint conditions. This is because two users belonging to the first
case are more likely to be included in each other's query results. We
call users with non-zero compatibility values related users.
Otherwise, they are called unrelated users.
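The score $\alpha$ and the compatibility function of Equation 4 can be sketched together as follows. The sketch assumes rectangular regions, a single time interval per policy, and at most one policy in each direction; the function names and the tuple layout are hypothetical.

```python
def overlap_1d(a, b):
    """Length of the overlap of intervals a = (lo, hi) and b = (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def area(r):
    """Area of rectangle r = (x_lo, x_hi, y_lo, y_hi)."""
    return (r[1] - r[0]) * (r[3] - r[2])


def overlap_area(r1, r2):
    return overlap_1d(r1[0:2], r2[0:2]) * overlap_1d(r1[2:4], r2[2:4])


def compatibility(p12, p21, S, T):
    """C(u1, u2) for policies given as (locr, tint) tuples or None.

    S is the area of the space domain, T the length of the time domain.
    """
    if p12 is None and p21 is None:
        return 0.0  # no policy between the two users
    if p12 is not None and p21 is not None:
        o = overlap_area(p12[0], p21[0])
        d = overlap_1d(p12[1], p21[1])
        if o > 0 and d > 0:
            # First case: region and time interval both overlap.
            alpha = (o / S) * (d / T)
            return 0.5 * (1 + alpha)
    # Second case: disjoint conditions or a one-sided policy; a missing
    # policy simply contributes no term to the sum.
    alpha = 0.5 * sum(area(p[0]) / S * (p[1][1] - p[1][0]) / T
                      for p in (p12, p21) if p is not None)
    return alpha
```

When two regions are disjoint, their areas sum to at most S, so the bracketed sum is at most 1 and $\alpha$ stays at or below 0.5, consistent with the bound stated for the second case; the same holds when the time intervals are disjoint.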

The next step is to determine the order of the sequence value 
assignment. We sort users in descending order of the number of 
their related users. This order gives higher priority to larger groups 
of users so as to preserve more relationships among users. 

From the sorted list, we assign the first user, $u_1$, a sequence
value $SV(u_1) = sv$ ($sv > 1$). Each user $u_j$ related to $u_1$
obtains a sequence value $SV(u_j) = SV(u_1) + (1 - C(u_1, u_j))$.
This scheme gives close sequence values to users with high
compatibility values.

In what follows, only users who do not have a sequence value
are considered. In particular, we select from the sorted list the next
user $u_2$ and assign it a sequence value
$SV(u_2) = SV(u_1) + \delta$, where $\delta > 1$. Parameter $\delta$
is an interval that helps separate different groups of users as well
as leaves adjustment space for future policy updates. Then, each user
$u_k$ related to $u_2$ obtains a value
$SV(u_k) = SV(u_2) + (1 - C(u_2, u_k))$. This process continues until
all users have sequence values. Policy updates are usually infrequent,
and hence policy encoding is conducted largely off-line and does not
add overhead at runtime.

Figure 5 outlines the algorithm of the sequence value assignment.
First (lines 1-5), for each user $u_i$ in U, we put the related
users (i.e., those whose compatibility value C is larger than 0) in
the group $G(u_i)$. Then we sort the users in descending order of
their group sizes and let $u_i$ be the i-th element of this list.
After that, we start assigning sequence values for each user
(lines 6-12). If a user $u_k$ has not been assigned a sequence value,
the user obtains a sequence value that is $\delta$ larger than that of
its predecessor. Next, we assign sequence values to all the group
members of user $u_k$. Each group member without a sequence value
obtains a sequence value equal to user $u_k$'s sequence value plus one
minus its compatibility score with user $u_k$.

Algorithm Sequence_Value_Assignment

Output: assignment result SV

1.  for i ← 1 to |U| do
        // U is the list of all users; U[i] = u_i
2.      G(u_i) ← ∅; SV(u_i) ← ⊥
3.      for j ← 1 to |U| do
4.          if C(u_i, u_j) > 0 then G(u_i) ← G(u_i) ∪ {u_j}
5.  U_l ← Sort(U, |G|, desc)
        // list U_l contains users in descending order of |G|; U_l[i] = u_i
6.  SV(u_1) ← sv
7.  for k ← 1 to n
8.      if SV(u_k) = ⊥ then
9.          SV(u_k) ← SV(u_{k-1}) + δ
10.     for each u_j in G(u_k) do
11.         if SV(u_j) = ⊥ then
12.             SV(u_j) ← SV(u_k) + (1 - C(u_k, u_j))
13. return SV

Figure 5: Sequence Value Assignment

To illustrate the algorithm, we step through an example. Let 6
users $u_1, u_2, \ldots, u_6$ be given. Let their compatibility values
be: $C(u_2, u_1) = 0.4$, $C(u_4, u_1) = 0.9$, $C(u_4, u_3) = 0.8$,
$C(u_5, u_3) = 0.2$, $C(u_6, u_3) = 0.6$. According to the number of
related users, we obtain this sorted list:
$(u_3, u_1, u_4, u_2, u_5, u_6)$. Let the initial sequence value be 2
and also let $\delta = 2$. We first assign $u_3$ sequence value 2. Its
related users $u_4$, $u_5$, and $u_6$ obtain the sequence values 2.2,
2.8, and 2.4, respectively. The next unassigned user is $u_1$, whose
sequence value is set as follows:
$SV(u_1) = SV(u_3) + \delta = 2 + 2 = 4$. User $u_2$ is currently
unassigned and is related to $u_1$. Thus,
$SV(u_2) = 4 + (1 - 0.4) = 4.6$. This completes the assignment.
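The assignment procedure of Figure 5 can be sketched in Python as follows. The sketch assumes that compatibility is symmetric and stored once per pair, and that ties in group size keep the original user order; the function name and the dictionary layout are hypothetical.

```python
def assign_sequence_values(users, C, sv=2.0, delta=2.0):
    """Assign sequence values following the steps of Figure 5.

    users : list of user ids in their original order
    C     : dict mapping (a, b) -> compatibility, assumed symmetric
    """
    def c(a, b):
        return C.get((a, b), C.get((b, a), 0.0))

    # Lines 1-4: group each user with its related users (C > 0).
    groups = {u: [w for w in users if w != u and c(u, w) > 0]
              for u in users}
    # Line 5: sort by descending group size (stable for ties).
    order = sorted(users, key=lambda u: len(groups[u]), reverse=True)

    SV = {}
    for k, u in enumerate(order):
        if u not in SV:
            # Line 6 for the first user, lines 8-9 otherwise.
            SV[u] = sv if k == 0 else SV[order[k - 1]] + delta
        for w in groups[u]:
            if w not in SV:
                # Line 12: closer values for higher compatibility.
                SV[w] = SV[u] + (1 - c(u, w))
    return SV


example_C = {("u2", "u1"): 0.4, ("u4", "u1"): 0.9, ("u4", "u3"): 0.8,
             ("u5", "u3"): 0.2, ("u6", "u3"): 0.6}
example_SV = assign_sequence_values(
    ["u1", "u2", "u3", "u4", "u5", "u6"], example_C)
```

Run on the compatibility values of the example above, the sketch yields, up to floating-point rounding, the sequence values 2 for u3, 2.2, 2.8, and 2.4 for u4, u5, and u6, 4 for u1, and 4.6 for u2.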

5.2 Structure of the PEB-Tree 

The PEB-tree is based on the B^x-tree [13], which in turn is based
on the B+-tree. This arrangement aims to make the PEB-tree easy
to implement in real database management systems that invariably
support B+-trees.

A leaf node in the PEB-tree has the following format:

$\langle PEB\_key, UID, x, y, v_x, v_y, t, Pntp \rangle$,

where PEB_key is the index key, UID is the user ID, $(x, y)$ and
$(v_x, v_y)$ record the user's location and velocity at time t, and
Pntp links to the user's privacy policy set and other user-specific
information. The internal nodes of the PEB-tree serve as a directory
that contains index key values and pointers to child nodes.

The critical issue in building the PEB-tree is the generation of
the PEB_key values for the users. A PEB_key consists of three
components: (i) TID, which indicates the time partition in the
PEB-tree in which a user's information is stored; (ii) ZV, which is
the Z-curve [22] value of a user's location as of the time of the time
partition TID; and (iii) SV, which is the policy encoding detailed
in Section 5.1. The first two components are computed in a similar
way as in the B^x-tree [13]. After we obtain the three components,
we combine them as follows to form the PEB_key.

$$PEB\_key = [TID]_2 \oplus [SV]_2 \oplus [ZV]_2 \quad (5)$$

Here, $[x]_2$ again denotes the binary value of x and $\oplus$ denotes
concatenation. The construction of the PEB_key gives higher priority
to sequence values than to location mapping values. This design is
attractive because users related to the query issuer are usually much
fewer than the unrelated users within the vicinity of a query. Using
the PEB_key, users who have policies related to one another will tend
to be stored close to each other, which reduces the cost of
processing privacy-aware queries.
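The concatenation in Equation 5 can be sketched with fixed-width bit fields as follows. The field widths and the fixed-point scaling of the real-valued sequence values (`sv_scale`) are assumptions, since the text does not state how SV is turned into a binary value.

```python
def peb_key(tid, sv, zv, sv_bits=16, zv_bits=16, sv_scale=100):
    """Concatenate TID, SV, and ZV into one integer key.

    sv is quantized to an integer by an assumed fixed-point scale so
    that it can occupy a fixed number of bits.
    """
    sv_int = round(sv * sv_scale)
    if not 0 <= sv_int < (1 << sv_bits):
        raise ValueError("sequence value does not fit in sv_bits")
    if not 0 <= zv < (1 << zv_bits):
        raise ValueError("Z-curve value does not fit in zv_bits")
    # TID occupies the most significant bits, then SV, then ZV.
    return (tid << (sv_bits + zv_bits)) | (sv_int << zv_bits) | zv
```

Because SV sits above ZV in the key, entries are ordered by time partition first, then by sequence value, and only then by location, which is what places users with related policies in neighboring leaf entries.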

The algorithms for insertion and deletion of objects in the
PEB-tree are similar to those for the B^x-tree. Each insertion or
deletion requires only a single-path traversal of the index, and the
PEB-tree has similarly efficient update performance as the B+-tree.

Figure 6 shows an expected PEB-tree that corresponds to the
example from Section 4. The figure suggests that the PEB-tree
arranges objects so that queries need fewer node accesses.

Figure 6: PEB-Tree Example



5.3 Privacy-Aware Range Query

The privacy-aware range query (PRQ, defined in Section 3) aims
to find users who satisfy not only spatial constraints, but also policy
constraints. To answer such a query, we first determine the search
ranges for the two constraints separately and then combine them
to form ranges that can be represented by PEB_key values. The
query algorithm consists of four steps.

The first step finds all users in the query range. Let $U_{loc}$
denote the set of such users. The basic idea is similar to the range
query in the B^x-tree [13]. Specifically, in each time partition TID,
the query range R is enlarged to cover users who are not in R as
of the time that they are indexed, but that may be in R as of the
query time. Then the enlarged query is converted into a set of
one-dimensional intervals that are the search ranges of consecutive
ZV values. Let there be k such intervals, given as follows:
$\{[ZV_{s_1}; ZV_{e_1}], \ldots, [ZV_{s_k}; ZV_{e_k}]\}$.

The second step finds the set of users (called $U_{pol}$) who may
allow the query issuer to see their locations at the query time. For
this purpose, we maintain a list for each user that stores the SV
values of users who have policies with respect to the list owner.
Such lists are updated only rarely, e.g., when a user is blocked by a
previous friend or when a user adds a new friend. We arrange the
users with policies with respect to the list owner in ascending
order of their SV values and denote the minimum and maximum
SV values by $SV_{min}$ and $SV_{max}$, respectively.

The third step computes the PEB_key range corresponding to
the intersection of $U_{loc}$ and $U_{pol}$ as obtained from the
previous steps. We first combine the starting and ending points of the
ZV ranges with each SV value, which yields these search ranges:

$[SV_{min} \oplus ZV_{s_1}; SV_{min} \oplus ZV_{e_1}]$,
$[SV_{min} \oplus ZV_{s_2}; SV_{min} \oplus ZV_{e_2}]$,
$\ldots$,
$[SV_{max} \oplus ZV_{s_k}; SV_{max} \oplus ZV_{e_k}]$.

Then we convert these into intervals of consecutive PEB_key values
by adding the TID of the time partition under consideration.

The PEB_key ranges are used to retrieve the query results in
the PEB-tree. Each candidate user found during the search has their
actual location and policies evaluated. Once a user with a given SV
value has been found, all remaining search intervals formed from this
user's SV value are skipped, since no other user can be found in them.
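The construction of the search ranges in the third step can be sketched as follows. Integer sequence values, 8-bit fields, and the concrete input values are assumptions chosen for illustration; the function name is hypothetical.

```python
def search_ranges(tid, friend_svs, zv_intervals, sv_bits=8, zv_bits=8):
    """Build one PEB_key search range per (SV value, ZV interval) pair
    within the time partition tid."""
    def key(sv, zv):
        # TID in the high bits, then SV, then ZV.
        return (tid << (sv_bits + zv_bits)) | (sv << zv_bits) | zv

    return [(key(sv, lo), key(sv, hi))
            for sv in sorted(friend_svs)   # ascending order of SV
            for lo, hi in zv_intervals]


# Illustrative inputs: five friends and two Z-curve intervals.
ranges = search_ranges(0, [50, 25, 89, 55, 80], [(13, 16), (25, 28)])
```

Five sequence values combined with two Z-curve intervals give ten search ranges, each of which covers only keys that share one sequence value.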

Figure 7 summarizes the main steps of the range query algorithm.
At the beginning, we find the minimum and maximum sequence values
in the query issuer's friend list. We start by considering the first
time partition in the PEB-tree, setting next_timestamp to 0. For
each time partition, we enlarge the original query range using the
Enlarge() function. The obtained enlarged query window R' is
converted into a set of one-dimensional intervals by ZVconvert()
according to the Z-curve mapping. By concatenating the TIDs
(computed from next_timestamp), the sequence values, and the ZV
values, we obtain the search range for the PEB_key values, which is
[StartPnt; EndPnt]. Then we locate the leaf node current_leaf that
contains the starting point of the search interval, and we keep
retrieving the right sibling nodes until the end of the search
interval. The search stops after all n time partitions are checked.

Algorithm PRQ(R, tq, uid, friend_list)
Input:  R is the query range and tq is the query time
        uid is the ID of the user who issues the query
        friend_list is the list of users related to uid
Output: result_list

SVmin ← smallest sequence value in friend_list
SVmax ← largest sequence value in friend_list
next_timestamp ← 0
more ← true
while more
    R' ← Enlarge(next_timestamp, R, tq)
    ZV_intervals ← ZVconvert(R')
    for each (ZVstart; ZVend) in ZV_intervals
        StartPnt ← TID ⊕ SVmin ⊕ ZVstart
        EndPnt ← TID ⊕ SVmax ⊕ ZVend
        current_leaf ← leaf node containing StartPnt
        for each user u in current_leaf do
            if u passes location and policy evaluation then
                add u to result_list
        if current_leaf contains EndPnt then
            next_timestamp ← next_timestamp + 1
        else
            current_leaf ← current_leaf.right_sibling
    if next_timestamp > n ∨ current_leaf = ⊥ then
        more ← false
end while
return result_list

Figure 7: Algorithm for the Privacy-Aware Range Query
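
The leaf-level scan described above can be illustrated with a small in-memory sketch. It is only an approximation of the idea: a sorted Python list of `(key, user)` pairs stands in for the chained leaf nodes of the PEB-tree, and the names `scan_ranges` and `passes_check` are hypothetical. The callback `passes_check` represents the evaluation of a candidate's actual location and policy.

```python
import bisect
from typing import Callable, Iterable, List, Set, Tuple


def scan_ranges(
    leaf_entries: List[Tuple[int, str]],
    key_ranges: Iterable[Tuple[int, int]],
    passes_check: Callable[[str], bool],
) -> List[str]:
    """Scan sorted (key, user) entries for every [start, end] key range."""
    keys = [key for key, _ in leaf_entries]
    results: List[str] = []
    seen: Set[str] = set()  # users already evaluated are not evaluated again
    for start, end in key_ranges:
        # Locate the first entry whose key is >= start (the "starting leaf").
        i = bisect.bisect_left(keys, start)
        # Walk to the right until the end of the search interval.
        while i < len(keys) and keys[i] <= end:
            user = leaf_entries[i][1]
            if user not in seen:
                seen.add(user)
                if passes_check(user):
                    results.append(user)
            i += 1
    return results
```

In a disk-based index, the walk to the right corresponds to following right-sibling pointers between leaf nodes, so the I/O cost is roughly the number of leaf nodes touched by the search intervals.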

Since the calculation of PEB_key values uses interleaving
algorithms, it is possible that the PEB_key intervals computed above
overlap with one another. To avoid duplicate search, the PEB_key
intervals are refined into a set of non-overlapping intervals that are
then used for search in the PEB-tree.
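
One straightforward way to obtain non-overlapping intervals is to sort the intervals and merge those that overlap. The sketch below shows this standard technique; the function name `merge_intervals` is an assumption, and the paper does not state which refinement procedure it actually uses.

```python
from typing import Iterable, List, Tuple


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge closed intervals that overlap into a sorted, disjoint list."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

After merging, each key is covered by at most one interval, so no leaf entry is visited twice during the scan.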

We proceed to compute search ranges for the example in Figure 3.
Assume that the dashed rectangle is the range query issued by
user $u_1$ to find his nearby friends, where the query range
R = ([2, 2], [4, 6]). Suppose that the SV values of $u_1$ and the
friends are the following: $SV(u_1) = 46$, $SV(u_{12}) = 50$,
$SV(u_{30}) = 25$, $SV(u_{59}) = 89$, $SV(u_{100}) = 55$,
$SV(u_{130}) = 80$. For simplicity, we assume that the space is
8 x 8. Then R is converted into two one-dimensional intervals
according to the Z-curve mapping: [13; 16] and [25; 28]. Combining
SV and ZV, we obtain 10 search ranges for each TID. The following
are the ranges for TID = 0:

• $[TID \oplus SV(u_{30}) \oplus ZV_{s_1};\ TID \oplus SV(u_{30}) \oplus ZV_{e_1}]$
  $= [0 \oplus 25 \oplus 13;\ 0 \oplus 25 \oplus 16] = [1613, 1616]$

• $[TID \oplus SV(u_{30}) \oplus ZV_{s_2};\ TID \oplus SV(u_{30}) \oplus ZV_{e_2}]$
  $= [0 \oplus 25 \oplus 25;\ 0 \oplus 25 \oplus 28] = [1625, 1628]$

• $[TID \oplus SV(u_{12}) \oplus ZV_{s_1};\ TID \oplus SV(u_{12}) \oplus ZV_{e_1}]$
  $= [0 \oplus 50 \oplus 13;\ 0 \oplus 50 \oplus 16] = [3213, 3216]$

• $[TID \oplus SV(u_{12}) \oplus ZV_{s_2};\ TID \oplus SV(u_{12}) \oplus ZV_{e_2}]$
  $= [0 \oplus 50 \oplus 25;\ 0 \oplus 50 \oplus 28] = [3225, 3228]$

• $[TID \oplus SV(u_{59}) \oplus ZV_{s_2};\ TID \oplus SV(u_{59}) \oplus ZV_{e_2}]$
  $= [0 \oplus 89 \oplus 25;\ 0 \oplus 89 \oplus 28] = [5725, 5728]$

During the search of these ranges, once a user is found in the first 
spatial range [13; 16], the second range will be skipped since a user 
has only one location. 

5.4 Privacy-Aware kNN Query

The algorithm for the privacy-aware kNN (PkNN, defined in
Section 3) query is derived from the B^x-tree's privacy-unaware
kNN query algorithm [13], which answers a query by iteratively
performing range queries with an incrementally enlarged search region
until k answers are obtained. First, a range $R_{q1}$ centered at q
and with radius $r_q = D_k/k$ is constructed, where $D_k$ is the
estimated distance between the query issuer and its k'th nearest
neighbor; $D_k$ can be estimated by the following equation, where N
is the total number of users [33]:

$$D_k = \frac{2}{\sqrt{\pi}}\left(1 - \sqrt{1 - \sqrt{k/N}}\right)$$
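
The estimate can be computed directly, as in the following sketch, which mirrors the first two steps of the algorithm in Figure 10. The function names `estimate_dk` and `initial_radius` are hypothetical, and the code assumes that coordinates are normalized to a unit square, which the text here does not state explicitly.

```python
import math


def estimate_dk(k: int, n: int) -> float:
    """Estimated distance to the k'th nearest neighbor among n users (k <= n)."""
    return 2.0 / math.sqrt(math.pi) * (1.0 - math.sqrt(1.0 - math.sqrt(k / n)))


def initial_radius(k: int, n: int) -> float:
    """Initial query radius r_q = D_k / k."""
    return estimate_dk(k, n) / k
```

Starting from the small radius $D_k/k$ rather than from $D_k$ itself keeps the first search window small; the window is then enlarged step by step only if too few neighbors are found.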

Since a user location that is inserted at a certain time is stored
in the index as of a future label timestamp, $R_{q1}$ is enlarged to
$R'_{q1}$, similarly to what we did for range queries, to cover all
users who may be in $R_{q1}$ as of the query time. If at least k
users are currently covered by the inscribed circle of $R_{q1}$ at
time $t_q$, the kNN algorithm returns k users and stops.

Otherwise, $R_{q1}$ is extended by $r_q$ to obtain $R_{q2}$ and the
corresponding enlarged window $R'_{q2}$. This time, the region
$R'_{q2} - R'_{q1}$ is searched. The process is repeated until k
users are found within the inscribed circle of the current range.
During the search, the corresponding two-dimensional ranges are
converted into a set of intervals in the transformed,
one-dimensional space.
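
The iterative enlargement can be sketched as follows for static, in-memory points. This is a simplified illustration, not the index-based algorithm: it ignores time partitions and window enlargement for moving users, it rescans the whole window in each round instead of only the newly added region, and the name `knn_by_enlargement` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def knn_by_enlargement(points: Dict[str, Point], q: Point, k: int, rq: float) -> List[str]:
    """Return the ids of the k points nearest to q, found by growing a square window."""
    if rq <= 0:
        raise ValueError("rq must be positive")
    if k >= len(points):
        # Fewer than k candidates exist: return all of them, nearest first.
        return sorted(points, key=lambda uid: math.dist(q, points[uid]))
    radius = rq
    while True:
        # Candidates inside the square window of side 2 * radius centered at q.
        in_window = [
            (math.dist(q, p), uid)
            for uid, p in points.items()
            if abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius
        ]
        # Only points inside the inscribed circle are certain to be nearest neighbors.
        in_circle = sorted(item for item in in_window if item[0] <= radius)
        if len(in_circle) >= k:
            return [uid for _, uid in in_circle[:k]]
        radius += rq  # extend the range by r_q and try again
```

The inscribed-circle test matters because a point in a corner of the square can be farther from q than a point just outside one of the square's sides.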

To answer the PkNN query, we need to consider the search ranges
of both the ZV and the SV values for each time partition TID.
The ZV ranges determine the locations of the users who are close
to the query issuer, which can be obtained by the general approach
already covered, but with the following modification. For each
query range, we consider only the one interval formed by the
minimum and maximum one-dimensional values of the query range.

The reason for this difference is the following. The PkNN query
requires multiple rounds of range queries, and the corresponding
one-dimensional query intervals obtained from different rounds of
enlargement may intersect. When we actually search those intervals
in the index, it is possible that multiple query intervals are located
in the same leaf node.

To avoid complex interval calculations and repeated leaf node
accesses, we use a single query interval for each range query.
Suppose that n rounds of enlargement occur. For round i (i.e.,
$R_{qi}$), we denote the starting and ending points of the set of
corresponding one-dimensional search intervals by $ZV_{s_i}$ and
$ZV_{e_i}$, respectively. The ranges of n rounds are given by:
$\{[ZV_{s_1}; ZV_{e_1}], \ldots, [ZV_{s_n}; ZV_{e_n}]\}$.

The SV ranges retrieve users who may be willing to disclose
their locations to the query issuer. A smaller SV value indicates
that the corresponding user is more likely to disclose their location
to the query issuer. Suppose that m users are willing to let
the query issuer see their locations under some conditions. By
arranging these m users in increasing order of their sequence values,
we have the following list: $[SV(u_1), SV(u_2), \ldots, SV(u_m)]$,
where $SV(u_i)$ is the sequence value of user $u_i$.

$$
\begin{bmatrix}
SV(u_1) \oplus [ZV_{s_1}; ZV_{e_1}] & SV(u_1) \oplus [ZV_{s_2}; ZV_{e_2}] & \cdots & SV(u_1) \oplus [ZV_{s_n}; ZV_{e_n}] \\
SV(u_2) \oplus [ZV_{s_1}; ZV_{e_1}] & SV(u_2) \oplus [ZV_{s_2}; ZV_{e_2}] & \cdots & SV(u_2) \oplus [ZV_{s_n}; ZV_{e_n}] \\
\vdots & \vdots & \ddots & \vdots \\
SV(u_m) \oplus [ZV_{s_1}; ZV_{e_1}] & SV(u_m) \oplus [ZV_{s_2}; ZV_{e_2}] & \cdots & SV(u_m) \oplus [ZV_{s_n}; ZV_{e_n}]
\end{bmatrix}_{m \times n}
$$

Figure 8: Search Matrix

Figure 8 shows the complete search space (represented as a matrix)
in one time partition for a given PkNN query. The actual
search is based on the values of the PEB_key computed from the
ZV and SV ranges in each element of the matrix together with the
TID of the corresponding time partition.

The next step is to find a good search order to obtain the query
result as soon as possible. Observe that ranges close to the upper-left
corner of the matrix have shorter spatial distances to or closer SV
value differences from the query issuer. Those ranges are therefore
more likely to contain the final query results, so we apply a
triangular search order as illustrated in Figure 9, where the
arrows and numbers in the brackets define the search order.
Following this order, the ZV and SV values are changed alternately
until k candidates are found.







Figure 9: Triangular Search Order 
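
A triangular order over the search matrix can be generated by walking its anti-diagonals, starting from the upper-left corner. The sketch below assumes this anti-diagonal interpretation and a particular direction within each diagonal; the exact arrows of Figure 9 are not recoverable from the text, so `triangular_order` is an illustration rather than the definitive order.

```python
from typing import Iterator, Tuple


def triangular_order(m: int, n: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) indices of an m x n matrix, one anti-diagonal at a time."""
    for d in range(m + n - 1):          # d = row + col is constant on a diagonal
        first_row = max(0, d - n + 1)   # keep col = d - row below n
        last_row = min(m, d + 1)        # keep row below m and col non-negative
        for row in range(first_row, last_row):
            yield row, d - row
```

Under this order, the fifth range visited is the element in the second row and second column (index `(1, 1)`), which agrees with the later example in which the range marked 5 lies in the second column.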

Having found k candidates, we check the remaining ranges in
the last visited column in the search matrix, i.e., a vertical scan is
done for the last visited column. For this vertical scan, the intervals
of the ZV values in the remaining ranges are shortened according
to the distance to the latest k'th nearest candidate.

For example, if k candidates are found after examining the users
falling in the range indicated by the circle 5 (in the second column
in Figure 9), we continue to consider the remaining ranges in the
second column, which are:
$SV(u_2) \oplus [ZV_{s_2}; ZV_{e_2}], \ldots, SV(u_m) \oplus [ZV_{s_2}; ZV_{e_2}]$.
The interval of the ZV values is a subset of $[ZV_{s_2}; ZV_{e_2}]$.

In particular, the new interval corresponds to the query square
with the query issuer at the center and twice the distance to the
k'th nearest candidate as its side length. This last step is needed in
order to determine whether there are other users who have not been
examined, but are closer to the query issuer than the k'th nearest
neighbor found so far.

Figure 10 outlines the algorithm for PkNN queries. First, we
compute the estimated distance between the query issuer and the
k'th nearest neighbor, based on which we obtain the initial query
radius. The search starts from the first time partition in the
PEB-tree, i.e., next_timestamp = 0. In each time partition, the
GetRange() function constructs the search range, which is a square
centered at (x, y) with side length equal to $2r_q$. According to the
search order adopted, the Next_friend() and Next_radius() functions
compute the corresponding SV value and radius of the query range,
respectively. These parameters are then supplied to the PRQ query
module (presented in the previous section) to retrieve candidate
query results.

Algorithm PKNN(x, y, tq, k, uid, friend_list)
Input:  (x, y) is user uid's location
        k is the required number of neighbors
        tq is the query time
        friend_list is the list of users related to uid
Output: result_list

Dk ← 2/sqrt(3.14) × (1 − sqrt(1 − sqrt(k/N)))
rq ← Dk/k
next_timestamp ← 0
more ← true
while more
    R ← GetRange((x, y), rq)
    fid ← Next_friend(friend_list)
    neighbor ← PRQ(R, tq, uid, fid)
    result_list ← Add_to_result(neighbor)
    if k neighbors are found
        fid ← Rest_friend(friend_list)
        R ← GetRange((x, y), kdist)
        neighbor ← PRQ(R, tq, uid, fid)
        result_list ← Add_to_result(neighbor)
        more ← false
    rq ← Next_radius()
return result_list

Figure 10: Algorithm for the PkNN Query

The Add_to_result() function verifies the actual locations and
policy constraints of the obtained results. If k neighbors are found,
the query range is refined based on the distance between the
query issuer and the k'th nearest neighbor found so far, and the
range of the sequence value is refined by the Rest_friend() function,
which returns the list of SV values in the last visited column in the
search matrix. After refinement, another PRQ query is invoked
to obtain the final query result. In case fewer than k neighbors are
found, the query radius is enlarged to start a new round of search.

6. QUERY I/O COST MODELING 

In this section, we model the I/O cost of querying with the PEB- 
tree. We consider the privacy-aware range query as it is the most 
fundamental query. 

The cost function we develop is based on the following assumptions
on the datasets. To simulate different relationships among
users, we first randomly divide users into groups and then generate
policies for each user based on a parameter called the grouping
factor ($\theta$), defined as $\theta = N_{gr} / N_p$, where $N_{gr}$
is the number of policies that a user has regarding other users in the
same group, and where $N_p$ is the user's total number of policies.
The grouping factor ranges from 0 to 1. When the factor is 1, each
user only has policies with users in the same group, and no policies
connect users in different groups. When the factor is 0, there is no
group, and each user may have policies with respect to any user in
the system.
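
As a small illustration, the grouping factor of a single user can be computed from that user's policies, assuming it is the ratio of same-group policies to all policies as the definitions above suggest. The names `grouping_factor`, `policy_targets`, and `group_of` are hypothetical.

```python
from typing import Dict, Sequence


def grouping_factor(owner: str, policy_targets: Sequence[str], group_of: Dict[str, int]) -> float:
    """Fraction of the owner's policies that concern users in the owner's own group."""
    if not policy_targets:
        raise ValueError("the owner has no policies")
    same_group = sum(1 for target in policy_targets if group_of[target] == group_of[owner])
    return same_group / len(policy_targets)
```

For a user in group 0 with policies about two users in group 0 and one user in group 1, the function returns 2/3.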

Our approach is to identify important parameters that significantly
affect query performance and then integrate their effects into
a cost function. Recall that the index keys in the PEB-tree are
generated by incorporating the effects of policy compatibility and
location proximity. Moreover, the policy compatibility is represented
as a sequence value to which the location encoding is appended.
As a result, the sequence value becomes the dominant factor during
querying, while the location encoding provides only supplementary
pruning. Thus, the cost function focuses on modeling the
effect of the sequence value assignment on the query performance.
An empirical validation (in Section 7.10) offers evidence that the
approach yields a quite accurate cost function.

The sequence value assignment is determined by the grouping
factor $\theta$, the number of policies per user (denoted as $N_p$),
and the total number of users (denoted as N). When $\theta = 1$, the
PEB-tree achieves the best performance. This is because when users are
well grouped, query results are constrained to users that are stored
together. The query cost increases when $\theta$ decreases. The
worst-case scenario occurs when each user is allowed to have a policy
with any other user in the system, i.e., $\theta = 0$. In this case,
the sequence values fail to group users, as there are no groups. The
I/O cost of a query is upper-bounded by the number of users related
to the query issuer when each of the related users is stored in a
different leaf node.

The above effect is modeled by the cost function $C_1$ in
Equation 6, where $N_l$ is the total number of leaf nodes in the
index; $N_p$ is the query cost in the above-mentioned worst-case
scenario; and $N_p^{\theta}$ estimates the benefit obtained from
grouping and captured by the grouping factor. The term 1 captures
the minimum query cost when the query result is stored in one leaf
node.

$$
C_1 =
\begin{cases}
1 + N_p - N_p^{\theta}, & N_p < N_l \\
1 + N_l - N_l^{\theta}, & N_p \geq N_l
\end{cases}
\qquad (6)
$$

In summary, $C_1$ estimates the number of nodes needed for storing
users related to the query issuer by taking into account $N_p$ and
$\theta$.
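
The behavior of this cost function at its two extremes can be checked with a short sketch. The code assumes that Equation 6 has the form $C_1 = 1 + N_p - N_p^{\theta}$ when $N_p < N_l$ and $C_1 = 1 + N_l - N_l^{\theta}$ otherwise; this form is a reading of the equation that matches the stated best case (cost 1) and worst case (cost $N_p$), and the function name `cost_c1` is hypothetical.

```python
def cost_c1(n_p: float, n_l: float, theta: float) -> float:
    """Estimated number of leaf nodes holding the users related to the query issuer."""
    # The cost can never exceed the number of leaf nodes, so cap at n_l.
    n = n_p if n_p < n_l else n_l
    return 1 + n - n ** theta
```

With `theta = 1` the function yields 1, the minimum cost, and with `theta = 0` it yields the number of related users (or the number of leaf nodes, if that is smaller), the upper bound described above.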

Next, we consider the effect of the parameter N. A larger N
leads to larger groups of users being connected through policies. 
Since the sequence value assignment is conducted group-by-group 
in descending order of the group size, the existence of many larger 
groups tends to increase the distance among the sequence values 
belonging to two related users. In other words, it increases the 
probability that users in the same query result are stored in different 
nodes, which in turn increases the query cost. 

The empirical studies covered in the next section show that the
query cost increases linearly with N. Therefore, we model it as a
linear function and integrate it into $C_1$ as follows.

$$
C =
\begin{cases}
1 + \left(a_1 \frac{N}{L^2} + a_2\right)\left(N_p - N_p^{\theta}\right), & N_p < N_l \\
1 + \left(a_1 \frac{N}{L^2} + a_2\right)\left(N_l - N_l^{\theta}\right), & N_p \geq N_l
\end{cases}
\qquad (7)
$$

In Equation 7, L is the side length of the space, and $N/L^2$ is then
the density of the object space. Parameters $a_1$ and $a_2$ are
obtained by taking as input any two sample points (i.e., the query
cost C) from the experiments on the datasets with the same location
distribution. For example, $a_1 = 10$ and $a_2 = 0.3$ for data sets
with uniform location distribution.
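
Obtaining the two parameters from two sample points amounts to solving a 2 x 2 linear system. The sketch below assumes that Equation 7 has the form $C = 1 + (a_1 N/L^2 + a_2)(N_p - N_p^{\theta})$ for $N_p < N_l$, that both samples share the same $N_p$, $\theta$, and L, and that they differ in N; the function name `fit_a1_a2` is hypothetical.

```python
from typing import Tuple


def fit_a1_a2(
    sample1: Tuple[float, float],
    sample2: Tuple[float, float],
    n_p: float,
    theta: float,
    side_length: float,
) -> Tuple[float, float]:
    """Solve for (a1, a2) from two (N, C) samples measured with the same n_p and theta."""
    gain = n_p - n_p ** theta          # common factor (N_p - N_p^theta); needs theta < 1
    (n1, c1), (n2, c2) = sample1, sample2
    d1 = n1 / side_length ** 2         # density of the first data set
    d2 = n2 / side_length ** 2         # density of the second data set
    if gain == 0 or d1 == d2:
        raise ValueError("samples do not determine a1 and a2")
    y1 = (c1 - 1) / gain               # equals a1 * d1 + a2
    y2 = (c2 - 1) / gain               # equals a1 * d2 + a2
    a1 = (y2 - y1) / (d2 - d1)
    a2 = y1 - a1 * d1
    return a1, a2
```

In practice, measured costs are noisy, so fitting over more than two samples (for instance by least squares) would likely give more stable parameters; the two-point version simply follows the description above.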

Using the cost function, we are interested in understanding the
range of settings within which the PEB-tree is competitive.
Specifically, we find that the PEB-tree performs worse
than the spatial index approach described in Section 4 when each
user is related to more than about 5% of all users. Considering
a data set with 100K users, 5% is 5,000, which is already a large
number of friends for a person.

Such a worst-case scenario may not occur in reality, as little
privacy is actually achieved in such a scenario. If all users are
related to each other, every user grants some access to everyone else
in the system. We believe that the general settings used in the
empirical studies covered next, in which users tend to show certain
privacy preferences to a group of users, make more sense.

7. EMPIRICAL PERFORMANCE STUDY

Following a description of the settings of the study, we cover
the offline cost of the initial index building. Then follow query
performance studies in which a range of workload settings are varied.
We end with a cost model validation study.

7.1 Experimental Settings 

We compare the performance of the PEB-tree with the approach
of using a spatial index as introduced in Section 4. Specifically,
we select the B^x-tree [13] as the spatial index, and we adopt the
commonly used filtering approach to handle peer-wise privacy
concerns. Since the PEB-tree is based on the B+-tree and the spatial
indexing approach is based on the B^x-tree, the same settings from
the literature [13], such as the number of tree partitions and the
maximum update interval, are used for the two approaches.

We use two types of synthetic data sets of user positions, namely
uniformly distributed positions and positions distributed in a
spatial network, both in a space domain with area 1000 x 1000. In
the uniform datasets, user positions are chosen randomly, and users
move in randomly chosen directions and at speeds ranging from
0 to 3. One may think of the unit of space as being kilometers and
the unit of speed as being kilometers per minute.
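
A uniform data set of this kind could be generated along the following lines. This is only a sketch: the function name `generate_uniform_users`, the dictionary layout of a user record, and the use of a seeded `random.Random` instance are assumptions, and the actual generator used in the experiments is not described in more detail.

```python
import math
import random
from typing import Dict, List


def generate_uniform_users(
    num_users: int, side: float = 1000.0, max_speed: float = 3.0, seed: int = 0
) -> List[Dict]:
    """Create users with uniform random positions, directions, and speeds."""
    rng = random.Random(seed)  # seeded for reproducible data sets
    users = []
    for uid in range(num_users):
        x, y = rng.uniform(0.0, side), rng.uniform(0.0, side)
        angle = rng.uniform(0.0, 2.0 * math.pi)   # random direction of movement
        speed = rng.uniform(0.0, max_speed)       # speed between 0 and max_speed
        velocity = (speed * math.cos(angle), speed * math.sin(angle))
        users.append({"id": uid, "pos": (x, y), "vel": velocity})
    return users
```

Calling `generate_uniform_users(60000)` would produce a data set of the default size used in the experiments.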

The network-based data sets are generated using an existing data 
generator [27], where users move in a network of two-way routes 
that connect a varying number of destinations. Objects start at ran- 
dom positions on routes and are assigned at random to one of three 
groups of objects with maximum speeds of 0.75, 1.5, and 3. When- 
ever an object reaches one of the destinations, it chooses the next 
target destination at random. Objects accelerate as they leave a des- 
tination, and they decelerate as they approach a destination. 

In all datasets, for each user, we generate a given number of ran- 
dom policies by varying the spatial ranges and time intervals with 
respect to a set of other users. The relationships among users are 
modeled using the grouping factor introduced in Section 6. Unless 
stated otherwise, the dataset contains 60,000 uniformly distributed 
users, and each has 50 policies with a grouping factor of 0.7. 

The default query window is square with side length 200, and
k is 5 in the PkNN query. The parameters used are summarized in
Table 1, where values in bold denote default values used.

The performance is evaluated in terms of I/O cost. The disk page 
size is set at 4K bytes, and a 50-page LRU buffer is simulated. 
We report only query performance as the two approaches achieve 
similarly good update performance. 
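
Counting I/O with a simulated LRU buffer can be done as in the following sketch, in which a page access costs one I/O only when the page is not already buffered. The class name `LRUBuffer` and its interface are hypothetical; the sketch only illustrates the standard LRU replacement policy.

```python
from collections import OrderedDict


class LRUBuffer:
    """Simulated page buffer with least-recently-used replacement."""

    def __init__(self, capacity: int = 50) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[int, bool]" = OrderedDict()
        self.io_count = 0  # number of simulated disk reads

    def access(self, page_id: int) -> None:
        if page_id in self.pages:
            # Buffer hit: mark the page as most recently used, no I/O.
            self.pages.move_to_end(page_id)
            return
        # Buffer miss: one disk read, then evict the LRU page if over capacity.
        self.io_count += 1
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
```

A query's I/O cost is then the increase of `io_count` while the query accesses index pages through the buffer.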



Table 1: Parameters and Their Settings

Parameter                     Setting
----------------------------  -------------------------------
Buffer                        50 pages
Number of users               10K, 20K, ..., 60K, ..., 100K
Maximum speed                 1, 2, 3, 4, 5, 6
Query window size             100, 200, ..., 1000
k (kNN query)                 1, ..., 5, 10
Grouping factor (θ)           0 (uniform), ..., 0.7, 1.0
Number of policies per user   10, ..., 50, ..., 100
Number of destinations        uniform, 25, 50, 100, ..., 500



7.2 Preprocessing Time for Policy Encoding 

In the first round of experiments, we study the preprocessing 
time used for policy encoding. This one-time processing is done 
offline when users are first registered. 

Figure 11(a) shows the results when varying the total number of
users from 10K to 100K. The experiments were conducted on a PC
with a 2.53 GHz Intel Xeon CPU and 4 Gbytes of memory. The time
increases linearly with the number of users. We also observe that
the preprocessing is very efficient, as it takes only about 10 seconds
to compare location privacy policies and generate sequence values
for 100K users.

Figure 11: Preprocessing Time

We also consider the policy encoding time when varying the 
number of policies to be analyzed for each user from 10 to 100, 
with 60K users. As shown in Figure 11(b), the processing time in- 
creases with the number of policies, but is still low. The efficient 
preprocessing can be attributed to our algorithm that uses the addi- 
tion operation to directly generate sequence values related to a user 
instead of sorting compatibility degrees multiple times. 

7.3 Effect of Total Number of Users 

We proceed to evaluate the query performance of the PEB-tree
and the spatial index approach. In this experiment, we vary the total
number of users from 10K to 100K, and we measure the average
I/O cost of 200 queries.

Figure 12(a) reports on privacy-aware range queries. We observe
that the PEB-tree yields much less I/O than the spatial index.
The performance gap increases with the data size. When the
data size grows to 100K, the PEB-tree is about 10 times better than
the spatial index. This behavior can be explained as follows. The
spatial index organizes users only based on their spatial proximity.
Thus, the spatial index needs to retrieve all users inside the query
range, regardless of whether or not they are allowed to be seen by
the query issuer, which increases costs. The PEB-tree stores users
based on both location and policy proximity, and search is narrowed
by using both location and policy constraints; hence it achieves the
better performance.

Figure 12: Effect of Total Number of Users

Figure 12(b) shows the performance of PkNN queries. Again,
the PEB-tree significantly outperforms the spatial index approach.
As for range queries, this demonstrates that the PEB-tree provides
a better storage arrangement by considering both location and
policy proximity, which in turn reduces unnecessary accesses to
non-qualifying users.

The triangular search order, which examines users in descending
order of their probabilities to be included in the query result, also
improves performance. In other words, users who are either close
to the query issuer or are more likely to be visible to the query
issuer are checked early, which directs the search towards users
who qualify for the result and shortens the query processing.

7.4 Effect of Number of Policies Per User 

In this experiment, we vary the number of policies per user from 
10 to 100. Without loss of generality, we assume that each user has 
only one location privacy policy with respect to a particular user. 

Figure 13(a) shows the performance of privacy-aware range
queries, from which we can see that the PEB-tree again outperforms the
spatial index. Moreover, it is not surprising to observe an increase
of the query cost in the PEB-tree with the number of policies. The
more policies, the more qualified users may be included in a query
result, and therefore more nodes are accessed. We also observe that
the performance of the spatial index is independent of the number
of policies. This is because the spatial index considers only
location proximity. Thus, queries with the same location constraint
will cause the same number of candidate users to be retrieved.

Figure 13: Effect of Number of Policies per User

Figure 13(b) compares the PkNN query performance of the two
approaches. Observe that the PEB-tree saves significant I/O
compared to the spatial index. The reason is similar to that discussed
for the previous experiments.

7.5 Effect of Grouping Factor 

Here, we investigate the effect of the grouping factor. As men- 
tioned earlier, when this factor is 0, each user can have policies with 
randomly selected users in the system. When it is 1, each user is 
only related to users in the same group. 

We first evaluate the range query performance. As shown in
Figure 14(a), the cost of the PEB-tree tends to decrease as the
grouping factor increases, whereas the spatial index
maintains a constant performance. The experiment confirms the
expectation that larger grouping factors help the PEB-tree achieve
more effective sequence value assignments that group related users
better. As the grouping factor approaches 1, users tend to be
divided into non-overlapping groups. In this case, users in the same
group are likely stored in the same or in a few nearby leaf nodes in
the PEB-tree, and therefore few I/Os are needed for queries.

Figure 14: Effect of the Grouping Factor

However, the grouping factor does not influence the query per- 
formance of the spatial index since it stores users purely based on 
their location proximity, which is not influenced by the grouping 
factor. 

Similar performance patterns are observed for PkNN queries, as
shown in Figure 14(b). The PEB-tree performs the best for the 
same reasons. 

7.6 Effect of Query Parameters 

We now evaluate the impact of the location-related query param- 
eters. For range queries, we measure the query cost by varying the 
query window side length from 100 to 1,000. For kNN queries, we
vary parameter k from 1 to 10. 

Figure 15(a) shows the PRQ performance. Again, the PEB-tree
significantly and consistently outperforms the spatial index.
Moreover, the PEB-tree cost is almost constant, while the spatial
index cost increases as the query window increases. The PEB-tree
achieves constant performance because no matter how large the
query window is, the maximum number of users to be checked by
the PEB-tree is bounded by the total number of users related to the
query issuer.

Figure 15: Varying Query Parameters

For the spatial index, the location-related query parameters play 
an important role. In particular, the larger the query window, the 
more nodes need to be accessed in the spatial index. 

Figure 15(b) shows the PkNN query performance of the two
trees when varying k. The PEB-tree has stable performance for
different values of k for reasons similar to those stated for
the last experiment. This also indicates that the PEB-tree is
relatively unaffected by the location-related parameters. In the case
of the spatial index, increasing the value of k slightly degrades query
performance since a larger k requires the spatial index to enlarge
the search range to find the qualified objects.

7.7 Effect of Spatial Distribution 

This round of experiments targets the effect of the location dis- 
tribution of the users. We observe the performance of range and 
nearest neighbor queries when using network-based data sets with 
the number of possible destinations (also called hubs) ranging from 
25 to 500. The fewer the destinations, the more spatially skewed 
the data is. 

Figure 16 shows that the PEB-tree achieves much better
performance than the spatial index in all cases. The increase in the
number of destinations only slightly affects the search ranges in
the PEB-tree. This is because the location constraints are not the
dominant factor during the index construction and hence have less
influence on the query performance. The performance of the
spatial index approach fluctuates slightly when varying the number of
possible destinations.

Figure 16: PEB-Tree vs. Spatial Index

7.8 Effect of Object Speed

We are also interested in studying how the object speed affects
the query performance of both approaches. We vary the maximum
speed from 1 to 6, choosing object speeds in the range from 0 to the
maximum speed at random. As shown in Figure 17, the query cost
of the spatial index increases slightly when objects move faster for
both types of queries. This is because the query algorithm of the
spatial index needs to enlarge the query window according to the
maximum object speed. The higher the speed, the larger the final
search region becomes, yielding a higher cost. Compared to the spatial
index, the PEB-tree has relatively stable performance. Although the
PEB-tree shares the query window enlargement problem with the
spatial index approach, the location constraints used in the
PEB-tree are dominated by the policy compatibility, which
significantly reduces the effect of this location-related parameter.

Figure 17: Effect of Object Speed

7.9 Effect of Updates

To observe the effect of updates on query performance, we measure
the query costs of both approaches each time 25% of the data
set has been updated. The experiments are conducted until the data
set has been fully updated twice. The results, in Figure 18, show
that the query cost of both approaches only fluctuates slightly. This
is because the two indexes share the same base structure, i.e., the
B+-tree. The fluctuations are mainly caused by the number of
objects belonging to different time partitions in the trees.

Figure 18: Effect of Updates

7.10 Cost Function Evaluation

We end by evaluating the accuracy of the cost function C
developed in Section 6. We compare the I/O cost as obtained from
the cost function C with the actual I/O cost. The comparison is
conducted by varying one of three parameters at a time: the total
number of users, the number of policies per user, and the grouping
factor. We consider these three parameters because they are the
main factors that affect the query performance of the PEB-tree, as
shown in the previous experiments. The results are shown in
Figure 19. From all the figures, we can see that the estimated cost
tracks the actual cost quite well.

Figure 19: Cost Function Evaluation



8. CONCLUSIONS AND FUTURE WORK 

We consider the problem of efficiently supporting range and k 
nearest neighbor queries in a setting that affords moving users of 
location-based services peer-wise location privacy. Specifically, 
different peer users are allowed to see the location of a user when 
the user is within a specified spatio-temporal range. 

To support the resulting privacy-aware queries, we present a new
indexing technique, called the PEB-tree, that leverages the B^x-tree,
which is based on the B+-tree. This is enabled by a technique that
encodes both the location privacy compatibility and the spatial
proximity among users in a one-dimensional value that is amenable to
B+-tree indexing; thus, users who tend to be allowed to see each
others' locations and who are spatially close tend to be stored
together on disk. Range and k nearest neighbor query algorithms are
presented that exploit the PEB-tree to simultaneously filter
candidate users according to both privacy compatibility and spatial
proximity.

An empirical performance study compares the proposed techniques
with an existing approach that simply uses a spatial index,
and the study offers insight into the behavior of the proposed
techniques for a wide variety of workloads. The study shows that the
proposals outperform the existing approach very substantially.

Several directions for future research exist. It is relevant to
consider multiple policies between two users for computing policy
compatibility degree. Similarly, it is relevant to explore new
encoding and accompanying querying techniques. Moreover, it is of
interest to extend other types of location-based queries to take into
account peer-wise privacy concerns.

Acknowledgments 

C. S. Jensen was supported in part by Geocrowd, an Initial Training 
Network under FP7 - People Marie Curie Actions, funded by the 
European Commission. 

9. REFERENCES 

[1] B. Bamba, L. Liu, P. Pesti, and T. Wang. Supporting
anonymous location queries in mobile environments with
privacygrid. In Proc. WWW, pages 237-246, 2008.

[2] S. Benford. Future location-based experiences. JISC
Technology and Standards Watch, 17 pages, 2005.

[3] S. Chen, C. S. Jensen, and D. Lin. A benchmark for
evaluating moving object indexes. In Proc. VLDB, pages
1574-1585, 2008.

[4] G. Cong, C. S. Jensen, and D. Wu. Efficient retrieval of the
top-k most relevant spatial web objects. In Proc. VLDB,
pages 337-348, 2009.

[5] E. Bertino, B. Catania, M. L. Damiani, and P. Perlasca.
Geo-RBAC: A spatially aware RBAC. In Proc. ACM
SACMAT, pages 29-37, 2005.

[6] M. Duckham and L. Kulik. A formal model of obfuscation
and negotiation for location privacy. In Proc. Pervasive,
pages 152-170, 2005.

[7] D. F. Ferraiolo and D. R. Kuhn. Role-based access control.
In Proc. National Computer Security Conference, pages
554-563, 1992.

[8] B. Gedik and L. Liu. A customizable k-anonymity model for
protecting location privacy. In Proc. IEEE ICDCS, pages
620-629, 2005.

[9] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and
K.-L. Tan. Private queries in location based services:
anonymizers are not necessary. In Proc. ACM SIGMOD,
pages 121-132, 2008.

[10] M. Gruteser and D. Grunwald. Anonymous usage of
location-based services through spatial and temporal
cloaking. In Proc. ACM MobiSys, pages 31-42, 2003.

[11] C. A. Gunter, M. J. May, and S. G. Stubblebine. A formal
privacy system and its application to location based services.
In Proc. Workshop on Privacy Enhancing Technologies,
LNCS 3424, pages 256-282, 2005.

[12] H. Hu and J. Xu. Non-exposure location anonymity. In Proc.
IEEE ICDE, pages 1120-1131, 2009.

[13] C. S. Jensen, D. Lin, and B. C. Ooi. Query and update
efficient B+-tree based indexing of moving objects. In Proc.
VLDB, pages 768-779, 2004.

[14] A. Khoshgozaran and C. Shahabi. Blind evaluation of nearest
neighbor queries using space transformation to preserve
location privacy. In Proc. SSTD, pages 239-257, 2007.

[15] H. Kido, Y. Yanagisawa, and T. Satoh. Protection of location
privacy using dummies for location-based services. In Proc.
IEEE ICDE Workshops, page 1248 (5 pages), 2005.

[16] M. Li, K. Sampigethaya, L. Huang, and R. Poovendran.
Swing & swap: user-centric approaches towards maximizing
location privacy. In Proc. ACM Workshop on Privacy in the
Electronic Society, pages 19-28, 2006.

[17] Z. Li, K. C. K. Lee, B. Zheng, W.-C. Lee, D. L. Lee, and
X. Wang. IR-tree: An efficient index for geographic
document search. IEEE TKDE, 23(4):585-599, 2011.



[18] D. Lin, E. Bertino, R. Cheng, and S. Prabhakar. Location 
privacy in moving-object environments. Transactions on 
Data Privacy, 2(1):21-46, 2009.

[19] M. F. Mokbel, C. Y. Chow, and W. G. Aref. The new casper: 
Query processing for location services without 
compromising privacy. In Proc. VLDB, pages 763-774, 
2006. 

[20] M. F. Mokbel. Privacy in location-based services: 

state-of-the-art and research directions. In Proc. IEEE MDM, 
page 228, 2007. 

[21] M. Monjur and S. I. Ahamed. Towards a landmark influence
framework to protect location privacy. In Proc. ACM SAC,
pages 219-220, 2009.

[22] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz.
Analysis of the clustering properties of the Hilbert
space-filling curve. IEEE TKDE, 13(1):124-141, 2001.

[23] G. Myles, A. Friday, and N. Davies. Preserving privacy in
environments with location-based applications. IEEE
Pervasive Computing, 2(1):56-64, 2003.

[24] Orange U. K. location tracking - dangers.
http://www1.orange.co.uk/safety/mobile/241/244.html.

[25] J. M. Patel, Y. Chen, and V. P. Chakka. Stripes: An efficient
index for predicted trajectories. In Proc. ACM SIGMOD,
pages 637-646, 2004.

[26] K. Puttaswamy and B. Y. Zhao. Preserving privacy in
location-based mobile social applications. In Proc. Workshop
on Mobile Computing Systems and Applications, pages 1-6,
2010.

[27] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and
M. A. Lopez. Indexing the positions of continuously moving
objects. In Proc. ACM SIGMOD, pages 331-342, 2000.

[28] E. Snekkenes. Concepts for personal location privacy
policies. In Proc. ACM Conference on Electronic Commerce,
pages 48-57, 2001.

[29] L. Sweeney. Achieving k-anonymity privacy protection using
generalization and suppression. International Journal on
Uncertainty, Fuzziness and Knowledge-based Systems,
10(5):571-588, 2002.

[30] J. C. Tanner. In search of LBS accountability. In
telecomasia.net, 2008.
http://www.telecomasia.net/content/search-lbs-accountability-0.

[31] Y. Tao, D. Papadias, and J. Sun. The TPR*-tree: an
optimized spatio-temporal access method for predictive
queries. In Proc. VLDB, pages 790-801, 2003.

[32] Y. Tao and X. Xiao. The B^dual-tree: indexing moving
objects by space filling curves in the dual space. VLDB
Journal, 17(3):379-400, 2008.

[33] Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis. An efficient
cost model for optimization of nearest neighbor search in
low and medium dimensional spaces. IEEE TKDE,
16(10):1169-1184, 2004.

[34] C. R. Vicente, D. Freni, C. Bettini, and C. S. Jensen.
Location-related privacy in geo-social networks. IEEE
Internet Computing, 15(3):20-27, 2011.

[35] X. Xiong and W. G. Aref. R-trees with update memos. In
Proc. IEEE ICDE, page 22, 2006.

[36] M. L. Yiu, C. S. Jensen, J. Møller, and H. Lu. Design and
analysis of a ranking approach to private location-based
services. ACM TODS, 36(2), article 10, 2011.






